HR Analytics Employee Attrition and Performance

BCon 147: BCon 147 Midterm Exercise

Author

Cherie Joyce L. Bongcaras

Published

October 25, 2024

1 Project overview

In this project, we will explore employee attrition and performance using the HR Analytics Employee Attrition & Performance dataset. The primary goal is to develop insights into the factors that contribute to employee attrition. By analyzing a range of factors, including demographic data, job satisfaction, work-life balance, and job role, we aim to help businesses identify key areas where they can improve employee retention.

2 Scenario

Imagine you are working as a data analyst for a mid-sized company that is experiencing high employee turnover, especially among high-performing employees. The company has been facing increased costs related to hiring and training new employees, and management is concerned about the negative impact on productivity and morale. The human resources (HR) team has collected historical employee data and now looks to you for actionable insights. They want to understand why employees are leaving and how to retain talent effectively.

Your task is to analyze the dataset and provide insights that will help HR prioritize retention strategies. These strategies could include interventions like revising compensation policies, improving job satisfaction, or focusing on work-life balance initiatives. The success of your analysis could lead to significant cost savings for the company and an increase in employee engagement and performance.

3 Understanding data source

The dataset used for this project provides information about employee demographics, performance metrics, and various satisfaction ratings. The dataset is particularly useful for exploring how factors such as job satisfaction, work-life balance, and training opportunities influence employee performance and attrition.

This dataset is well-suited for conducting in-depth analysis of employee performance and retention, enabling us to build predictive models that identify the key drivers of employee attrition. Additionally, we can assess the impact of various organizational factors, such as training and work-life balance, on both performance and retention outcomes.

## the datatable function from the DT package creates an HTML widget that displays the dataset
## install DT package if the package is not yet available in your R environment
readxl::read_excel("dataset/dataset-variable-description.xlsx") |> 
  DT::datatable()

4 Data wrangling and management

Libraries

Task: Load the necessary libraries

Before we start working on the dataset, we need to load the libraries that will be used for data wrangling, analysis, and visualization. Load the following libraries here. Packages that are not yet installed can be installed with the install.packages function. Some packages are introduced later in this project, so install them as needed and load them here.

# load all your libraries here
library(dplyr)
library(readxl)
library(DT)
library(janitor)
library(ggplot2)
library(scales)
library(forcats)
library(sjPlot)
library(report)
library(ggstatsplot)

4.1 Data importation

Task 4.1. Merging dataset
  • Import the two datasets Employee.csv and PerformanceRating.csv. Save Employee.csv as employee_dta and PerformanceRating.csv as perf_rating_dta.

  • Merge the two datasets using the left_join function from dplyr. Use the EmployeeID variable as the variable to join by. You may read more information about the left_join function here.

  • Save the merged dataset as hr_perf_dta and display the dataset using the datatable function from DT package.

## import the two data here
employee_dta <- read.csv("~/Special Topics_R Studio/midterm-bcon147-project-exercise/dataset/Employee.csv")
perf_rating_dta <- read.csv("~/Special Topics_R Studio/midterm-bcon147-project-exercise/dataset/PerformanceRating.csv")

## merge employee_dta and perf_rating_dta using left_join function.
## save the merged dataset as hr_perf_dta
hr_perf_dta <- left_join(employee_dta, perf_rating_dta, by = "EmployeeID")

## Use the datatable from DT package to display the merged dataset
datatable(hr_perf_dta)
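
A left join keeps every row of the left table and repeats it once for each matching row in the right table. The sketch below uses small made-up tables (the values and the `_toy` names are hypothetical, not taken from the project files) to show how an employee with several performance reviews appears several times after the merge, and how an employee with no review gets `NA` ratings.

```r
library(dplyr)

# Toy stand-ins for Employee.csv and PerformanceRating.csv (hypothetical values)
employee_toy <- data.frame(
  EmployeeID = c("E1", "E2", "E3"),
  Attrition  = c("No", "Yes", "No")
)
perf_toy <- data.frame(
  EmployeeID    = c("E1", "E1", "E2"),
  ManagerRating = c(3, 4, 2)
)

# left_join keeps all employees and repeats an employee once per matching review
merged_toy <- left_join(employee_toy, perf_toy, by = "EmployeeID")

nrow(merged_toy)                      # number of rows after the merge
n_distinct(merged_toy$EmployeeID)     # number of distinct employees
sum(is.na(merged_toy$ManagerRating))  # employees without any review
```

If the real merge behaves the same way, row counts in later steps describe employee-review records rather than unique employees, which is worth keeping in mind when reading percentages.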

4.2 Data management

Task 4.2. Standardizing variable names
  • Using the clean_names function from janitor package, standardize the variable names by using the recommended naming of variables.

  • Save the renamed variables as hr_perf_dta to update the dataset.

## clean names using the janitor packages and save as hr_perf_dta
hr_perf_dta <- hr_perf_dta |> 
  clean_names()

## display the renamed hr_perf_dta using datatable function
datatable(hr_perf_dta)

Task 4.2. Recode data entries
  • Create a new variable cat_education wherein education is 1 = No formal education; 2 = High school; 3 = Bachelor; 4 = Masters; 5 = Doctorate. Use the case_when function to accomplish this task.

  • Similarly, create new variables cat_envi_sat, cat_job_sat, and cat_relation_sat for environment_satisfaction, job_satisfaction, and relationship_satisfaction, respectively. Re-code the values accordingly as 1 = Very dissatisfied; 2 = Dissatisfied; 3 = Neutral; 4 = Satisfied; and 5 = Very satisfied.

  • Create new variables cat_work_life_balance, cat_self_rating, cat_manager_rating for work_life_balance, self_rating, and manager_rating, respectively. Re-code accordingly as 1 = Unacceptable; 2 = Needs improvement; 3 = Meets expectation; 4 = Exceeds expectation; and 5 = Above and beyond.

  • Create a new variable bi_attrition by transforming the attrition variable into a numeric variable. Re-code accordingly as No = 0, and Yes = 1.

  • Save all the changes in the hr_perf_dta. Note that saving the changes with the same name will update the dataset with the new variables created.

## create cat_education
hr_perf_dta <- hr_perf_dta |> 
  mutate(cat_education = case_when(
    education == 1 ~ "No formal education",
    education == 2 ~ "High school",
    education == 3 ~ "Bachelor",
    education == 4 ~ "Masters",
    education == 5 ~ "Doctorate",
    TRUE ~ NA_character_  
  ))

## create cat_envi_sat,  cat_job_sat, and cat_relation_sat
hr_perf_dta <- hr_perf_dta  |> 
  mutate(cat_envi_sat = factor(environment_satisfaction, levels = 1:5, labels = c("Very dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very satisfied")),
         cat_job_sat = factor(job_satisfaction, levels = 1:5, labels = c("Very dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very satisfied")),
         cat_relation_sat = factor(relationship_satisfaction, levels = 1:5, labels = c("Very dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very satisfied")))

## create cat_work_life_balance, cat_self_rating, and cat_manager_rating
hr_perf_dta <- hr_perf_dta  |> 
  mutate(cat_work_life_balance = factor(work_life_balance, levels = 1:5, labels = c("Unacceptable", "Needs improvement", "Meets expectation", "Exceeds expectation", "Above and beyond")),
         cat_self_rating = factor(self_rating, levels = 1:5, labels = c("Unacceptable", "Needs improvement", "Meets expectation", "Exceeds expectation", "Above and beyond")),
         cat_manager_rating = factor(manager_rating, levels = 1:5, labels = c("Unacceptable", "Needs improvement", "Meets expectation", "Exceeds expectation", "Above and beyond")))

## create bi_attrition
hr_perf_dta <- hr_perf_dta  |> 
  mutate(bi_attrition = ifelse(attrition == "Yes", 1, 0))

## print the updated hr_perf_dta using datatable function
datatable(hr_perf_dta)
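
The recoding above repeats the same label vector for each variable. As an optional alternative, the sketch below applies one label vector to several columns at once with `across()`. It runs on made-up data, and the generated names (`cat_environment_satisfaction`, `cat_job_satisfaction`) are an assumption of this sketch; they differ from the shorter names the task asks for.

```r
library(dplyr)

sat_labels <- c("Very dissatisfied", "Dissatisfied", "Neutral",
                "Satisfied", "Very satisfied")

# Toy data with two satisfaction columns (hypothetical values)
toy <- data.frame(
  environment_satisfaction = c(1, 3, 5, NA),
  job_satisfaction         = c(2, 4, 4, 1)
)

# Recode both columns in one step; new columns get a "cat_" prefix
toy <- toy |>
  mutate(across(
    c(environment_satisfaction, job_satisfaction),
    ~ factor(.x, levels = 1:5, labels = sat_labels),
    .names = "cat_{.col}"
  ))

levels(toy$cat_job_satisfaction)
as.character(toy$cat_environment_satisfaction)
```

Missing ratings stay `NA` after recoding, which matches the behavior of the `factor()` calls used in the project code.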

5 Exploratory data analysis

5.1 Descriptive statistics of employee attrition

Task 5.1. Breakdown of attrition by key variables
  • Select the variables attrition, job_role, department, age, salary, job_satisfaction, and work_life_balance. Save as attrition_key_var_dta.

  • Compute and plot the attrition rate across job_role, department, age, salary, job_satisfaction, and work_life_balance. To compute the attrition rate, group the dataset by job role. Afterward, use the count function to get the frequency of attrition for each job role and divide it by the total number of observations. Save the computation as pct_attrition. Do not forget to ungroup before storing the output. Store the output as attrition_rate_job_role.

  • Plot for the attrition rate across job_role has been done for you! Study each line of code. You have the freedom to customize your plot accordingly. Show your creativity!
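
One way to read "attrition rate" is the share of records within each group where attrition is "Yes". The sketch below shows that reading on made-up data (the rows and the `rate_toy` name are hypothetical). It is an alternative interpretation for comparison, not the computation used in the project code.

```r
library(dplyr)

# Toy data: four recruiters (one left) and two managers (both left)
toy <- data.frame(
  job_role  = c("Recruiter", "Recruiter", "Recruiter", "Recruiter",
                "Manager", "Manager"),
  attrition = c("Yes", "No", "No", "No", "Yes", "Yes")
)

rate_toy <- toy |>
  group_by(job_role) |>
  summarise(
    n_records     = n(),                           # all rows in the group
    n_left        = sum(attrition == "Yes"),       # rows with attrition
    pct_attrition = mean(attrition == "Yes") * 100, # within-group rate
    .groups = "drop"
  )

rate_toy
```

With this definition a small group can have a high rate, whereas dividing group counts by the overall total mostly reflects how large each group is.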

## selecting attrition key variables and save as `attrition_key_var_dta`
attrition_key_var_dta <- hr_perf_dta  |> 
  select(attrition, job_role, department, age, salary, job_satisfaction, work_life_balance)

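## compute the attrition rate across job_role and save as attrition_rate_job_role
## note: count() here tallies every row in each job role (leavers and stayers), so
## pct_attrition is each role's share of all records, not a within-role attrition rate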
attrition_rate_job_role <- attrition_key_var_dta  |> 
  group_by(job_role)  |> 
  count()  |> 
  ungroup() |> 
  mutate(pct_attrition = n / sum(n) * 100)

## print attrition_rate_job_role
print(attrition_rate_job_role)
# A tibble: 13 × 3
   job_role                      n pct_attrition
   <chr>                     <int>         <dbl>
 1 Analytics Manager           213         3.09 
 2 Data Scientist             1387        20.1  
 3 Engineering Manager         307         4.45 
 4 HR Business Partner          25         0.362
 5 HR Executive                119         1.72 
 6 HR Manager                   17         0.246
 7 Machine Learning Engineer   582         8.44 
 8 Manager                     145         2.10 
 9 Recruiter                   152         2.20 
10 Sales Executive            1567        22.7  
11 Sales Representative        500         7.25 
12 Senior Software Engineer    512         7.42 
13 Software Engineer          1373        19.9  
## compute the attrition rate across department and save as attrition_rate_department
attrition_rate_department <- attrition_key_var_dta  |> 
  group_by(department)  |> 
  count()  |> 
  ungroup() |> 
  mutate(pct_attrition = n / sum(n) * 100)

## print attrition_rate_department
print(attrition_rate_department)
# A tibble: 3 × 3
  department          n pct_attrition
  <chr>           <int>         <dbl>
1 Human Resources   313          4.54
2 Sales            2211         32.0 
3 Technology       4375         63.4 
## compute the attrition rate across age and save as attrition_rate_age
attrition_rate_age <- attrition_key_var_dta  |> 
  group_by(age)  |> 
  count()  |> 
  ungroup() |> 
  mutate(pct_attrition = n / sum(n) * 100)

## print attrition_rate_age
print(attrition_rate_age)
# A tibble: 34 × 3
     age     n pct_attrition
   <int> <int>         <dbl>
 1    18    58         0.841
 2    19   119         1.72 
 3    20   149         2.16 
 4    21   254         3.68 
 5    22   324         4.70 
 6    23   264         3.83 
 7    24   472         6.84 
 8    25   597         8.65 
 9    26   545         7.90 
10    27   412         5.97 
# ℹ 24 more rows
## compute the attrition rate across salary and save as attrition_rate_salary
attrition_rate_salary <- attrition_key_var_dta  |> 
  group_by(salary)  |> 
  count()  |> 
  ungroup() |> 
  mutate(pct_attrition = n / sum(n) * 100)

## print attrition_rate_salary
print(attrition_rate_salary)
# A tibble: 1,455 × 3
   salary     n pct_attrition
    <int> <int>         <dbl>
 1  20387    10        0.145 
 2  20418     1        0.0145
 3  20526     1        0.0145
 4  20583     1        0.0145
 5  20650    10        0.145 
 6  20778     1        0.0145
 7  20802     1        0.0145
 8  21026     1        0.0145
 9  21158     1        0.0145
10  21202     1        0.0145
# ℹ 1,445 more rows
## compute the attrition rate across job_satisfaction and save as attrition_rate_job_satisfaction
attrition_rate_job_satisfaction <- attrition_key_var_dta  |> 
  group_by(job_satisfaction)  |> 
  count()  |> 
  ungroup() |> 
  mutate(pct_attrition = n / sum(n) * 100)

## print attrition_rate_job_satisfaction
print(attrition_rate_job_satisfaction)
# A tibble: 6 × 3
  job_satisfaction     n pct_attrition
             <int> <int>         <dbl>
1                1   130          1.88
2                2  1674         24.3 
3                3  1651         23.9 
4                4  1685         24.4 
5                5  1569         22.7 
6               NA   190          2.75
## compute the attrition rate across work_life_balance and save as attrition_rate_work_life_balance
attrition_rate_work_life_balance <- attrition_key_var_dta  |> 
  group_by(work_life_balance)  |> 
  count()  |> 
  ungroup() |> 
  mutate(pct_attrition = n / sum(n) * 100)

## print attrition_rate_work_life_balance
print(attrition_rate_work_life_balance)
# A tibble: 6 × 3
  work_life_balance     n pct_attrition
              <int> <int>         <dbl>
1                 1   121          1.75
2                 2  1702         24.7 
3                 3  1670         24.2 
4                 4  1706         24.7 
5                 5  1510         21.9 
6                NA   190          2.75
## Plot the attrition rate by job role
attrition_rate_job_role |> 
  mutate(job_role = fct_reorder(job_role, pct_attrition)) |>  
  ggplot(aes(x = job_role, y = pct_attrition, fill = pct_attrition)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +  
  scale_fill_gradient(low = "lightgreen", high = "darkgreen") +  
  labs(title = "Attrition Rate by Job Role", x = "Job Role", y = "Attrition Rate (%)") +
  theme_minimal(base_size = 14) +  
  coord_flip() +  
  theme(
  legend.position = "none",
  plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
  axis.title = element_text(size = 12),
  axis.text = element_text(size = 10),
  panel.grid.major.y = element_blank(),  
  panel.grid.minor = element_blank())  

## Plot the attrition rate by department
attrition_rate_department |> 
  mutate(department = fct_reorder(department, pct_attrition)) |>  
  ggplot(aes(x = pct_attrition, y = department, fill = pct_attrition)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +  
  scale_fill_gradient(low = "lightgreen", high = "darkgreen") +  
  labs(title = "Attrition Rate by Department", 
       x = "Attrition Rate (%)", 
       y = "Department") +
  theme_minimal(base_size = 14) +  
  coord_flip() +  
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    panel.grid.major.x = element_blank(),  
    panel.grid.minor = element_blank())  

## Plot the attrition rate by age
# Group age into 5-year intervals
attrition_rate_age <- attrition_rate_age |> 
  mutate(age_group = cut(age, 
                         breaks = seq(15, 60, by = 5),  
                         labels = paste0(seq(15, 55, by = 5), "-", seq(19, 59, by = 5)),  
                         right = FALSE))

# Reorder age groups for better readability
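attrition_rate_age <- attrition_rate_age |> 
  filter(!is.na(age_group) & !is.na(pct_attrition)) |> 
  ## add up the single-age shares so each age group has one bar showing its total
  group_by(age_group) |> 
  summarise(pct_attrition = sum(pct_attrition), .groups = "drop") |> 
  mutate(age_group = factor(age_group, levels = rev(levels(age_group))))  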

# Plot attrition rate by age group
attrition_rate_age |> 
  ggplot(aes(x = age_group, y = pct_attrition, fill = pct_attrition)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +  
  scale_fill_gradient(low = "lightgreen", high = "darkgreen") +  
  labs(title = "Attrition Rate by Age Group", 
       x = "Age Group", 
       y = "Attrition Rate (%)") +
  theme_minimal(base_size = 14) +  
  coord_flip() +  
  theme(
  legend.position = "none",
  plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
  axis.title = element_text(size = 12),
  axis.text = element_text(size = 10),
  panel.grid.major.y = element_blank(),  
  panel.grid.minor = element_blank())  
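
The age grouping relies on `cut()` with `right = FALSE`, which makes each interval closed on the left and open on the right, so age 20 falls in "20-24" rather than "15-19". The sketch below checks this on a few made-up ages, using the same breaks and labels as the code above.

```r
# A few hypothetical ages, including values that sit exactly on a break
ages <- c(18, 19, 20, 24, 25, 51)

age_group <- cut(
  ages,
  breaks = seq(15, 60, by = 5),                                    # 15, 20, ..., 60
  labels = paste0(seq(15, 55, by = 5), "-", seq(19, 59, by = 5)),  # "15-19", ...
  right  = FALSE                                                   # [15, 20), [20, 25), ...
)

as.character(age_group)
# "15-19" "15-19" "20-24" "20-24" "25-29" "50-54"
```

Note that `cut()` needs exactly one label fewer than the number of breaks, which is why the label sequences stop at 55 and 59.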

## Plot the attrition rate by salary

# Group salary into 100,000 intervals
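## round the top break up to the next 100,000 so the highest salaries are not dropped
salary_breaks <- seq(0, ceiling(max(attrition_rate_salary$salary, na.rm = TRUE) / 100000) * 100000, by = 100000)

attrition_rate_salary <- attrition_rate_salary |> 
  mutate(salary_group = cut(salary, 
                            breaks = salary_breaks, 
                            right = FALSE,
                            include.lowest = TRUE,
                            labels = paste0(scales::comma(head(salary_breaks, -1)),
                                            "-",
                                            scales::comma(tail(salary_breaks, -1) - 1)))) |> 
  filter(!is.na(salary_group) & !is.na(pct_attrition)) |> 
  ## add up the shares within each salary group so each bar shows the group total
  group_by(salary_group) |> 
  summarise(pct_attrition = sum(pct_attrition), .groups = "drop")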

# Plot attrition rate by salary group
attrition_rate_salary |> 
  ggplot(aes(x = pct_attrition, y = salary_group, fill = pct_attrition)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +  
  scale_fill_gradient(low = "lightgreen", high = "darkgreen") +  
  labs(title = "Attrition Rate by Salary Group", 
       x = "Attrition Rate (%)", 
       y = "Salary Range") +
  scale_x_continuous(labels = comma) +  
  theme_minimal(base_size = 14) +  
  coord_flip() +  
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    panel.grid.major.x = element_blank(),  
    panel.grid.minor = element_blank())  

## Plot the attrition rate by job_satisfaction
attrition_rate_job_satisfaction |> 
  filter(!is.na(job_satisfaction) & !is.na(pct_attrition)) |> 
  ggplot(aes(x = factor(job_satisfaction), y = pct_attrition, fill = factor(job_satisfaction))) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("#98FB98",  # Lightest (Pale green)
                              "#3CB371",    # Medium sea green 
                              "#2E8B57",    # Sea green
                              "#1B4D3E",    # Darkest shade
                              "#90EE90")) + # Light green 
  labs(
    title = "Attrition Rate by Job Satisfaction",
    x = "Job Satisfaction",
    y = "Attrition Rate (%)"
  ) +
  theme_minimal() +
  coord_flip() +
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    panel.grid.major.y = element_blank(),  
    panel.grid.minor = element_blank())  

## Plot the attrition rate by work_life_balance
attrition_rate_work_life_balance |> 
  filter(!is.na(work_life_balance) & !is.na(pct_attrition)) |>
  ggplot(aes(x = factor(work_life_balance), y = pct_attrition, fill = factor(work_life_balance))) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("#98FB98",  # Lightest (Pale green)
                              "#2E8B57",    # Sea green
                              "#3CB371",    # Medium sea green
                              "#1B4D3E",    # Darkest shade
                              "#90EE90")) + # Light green 
  labs(
    title = "Attrition Rate by Work Life Balance",
    x = "Work Life Balance",
    y = "Attrition Rate (%)"
  ) +
  theme_minimal() +
  coord_flip() +
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    axis.text = element_text(size = 10),
    panel.grid.major.y = element_blank(),  
    panel.grid.minor = element_blank())  

5.2 Identifying attrition key drivers using correlation analysis

Task 5.2. Conduct a correlation analysis to identify key drivers
  • Conduct a correlation analysis of key variables: bi_attrition, salary, years_at_company, job_satisfaction, manager_rating, and work_life_balance. Use the cor() function to run the correlation analysis. Remove missing values using the na.omit() before running the correlation analysis. Save the output in hr_corr.

  • Use a correlation matrix or heatmap to visualize the relationship between these variables and attrition. You can use the GGally package and use the ggcorr function to visualize the correlation heatmap. You may explore this site for more information: ggcorr.

  • Discuss which factors seem most correlated with attrition and what that suggests about why employees are leaving.
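
The `na.omit()` step in this task drops every row that has a missing value in any selected column before the correlation is computed. The sketch below, on made-up data, shows that `cor()` with `use = "complete.obs"` gives the same matrix, while `use = "pairwise.complete.obs"` is a different choice that keeps more rows per pair. The toy values are hypothetical.

```r
# Toy data with missing values in two columns (hypothetical values)
toy <- data.frame(
  bi_attrition     = c(1, 0, 0, 1, 0, 1),
  years_at_company = c(1, 8, 6, 2, NA, 3),
  salary           = c(40, 90, 85, NA, 70, 50)
)

corr_listwise <- cor(na.omit(toy))                       # drop incomplete rows first
corr_complete <- cor(toy, use = "complete.obs")          # same rows, handled by cor()
corr_pairwise <- cor(toy, use = "pairwise.complete.obs") # per-pair deletion instead

all.equal(corr_listwise, corr_complete)
```

Listwise deletion keeps the matrix based on one consistent set of rows, which also matches the rows a regression on the same variables would use.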

## conduct correlation of key variables. 
hr_corr_data <- hr_perf_dta  |> 
  select(bi_attrition, salary, years_at_company, job_satisfaction, manager_rating, work_life_balance)

hr_corr_data <- na.omit(hr_corr_data)

hr_corr <- cor(hr_corr_data)

## print hr_corr 

print(hr_corr)
                  bi_attrition       salary years_at_company job_satisfaction
bi_attrition       1.000000000 -0.211181478    -0.6896527798     0.0132368129
salary            -0.211181478  1.000000000     0.2206442116     0.0053054850
years_at_company  -0.689652780  0.220644212     1.0000000000     0.0008700583
job_satisfaction   0.013236813  0.005305485     0.0008700583     1.0000000000
manager_rating    -0.007654429 -0.001596736     0.0178656879    -0.0158205481
work_life_balance  0.003428836 -0.001517145     0.0079339508     0.0417242942
                  manager_rating work_life_balance
bi_attrition        -0.007654429       0.003428836
salary              -0.001596736      -0.001517145
years_at_company     0.017865688       0.007933951
job_satisfaction    -0.015820548       0.041724294
manager_rating       1.000000000       0.007996938
work_life_balance    0.007996938       1.000000000
## install GGally package and use ggcorr function to visualize the correlation

library(GGally)

ggcorr(hr_corr_data, label = TRUE, label_alpha = TRUE, low = "lightgreen", high = "forestgreen", digits = 4, name = "Correlation") +
  theme_minimal() + 
  theme(
    plot.title = element_text(hjust = 0.7, size = 14, face = "bold"),
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),      
    axis.text = element_text(size = 10),                               
    legend.position = "right") +
  labs(title = "Correlation Matrix", x = NULL, y = NULL)

Discussion:

From the correlation matrix and heatmap, the factors most correlated with employee attrition (bi_attrition) are:

  1. Years at Company (-0.69): This is the strongest correlation with attrition. The strong negative value means that records with longer tenure are much less likely to belong to employees who left. This could indicate that long-tenured employees are more loyal or satisfied, or that newer employees are more likely to leave because of unmet expectations or difficulty adjusting to the company culture. Part of the relationship may also be mechanical, since employees who leave early cannot accumulate many years at the company.

  2. Salary (-0.21): Salary has a weak negative correlation with attrition, meaning employees with higher salaries tend to leave somewhat less often. This suggests that compensation plays a role in retention: those who are paid well may feel more valued or financially secure, reducing the likelihood that they seek opportunities elsewhere. Conversely, lower-paid employees might be more motivated to look for better-paying jobs.

  3. Job Satisfaction (0.01): The almost negligible positive correlation between job satisfaction and attrition indicates little to no direct relationship. This suggests that factors other than job satisfaction (such as salary or tenure) might be more influential in determining whether employees leave. While surprising, it may reflect that employees are not necessarily leaving because they are dissatisfied with their jobs.

  4. Manager Rating (-0.007) and Work-Life Balance (0.003): These factors show almost no correlation with attrition, suggesting that neither the performance rating an employee receives from their manager nor the work-life balance rating is linearly related to the decision to stay or leave. This might imply that employees’ decisions to leave are more closely tied to structural factors like salary and tenure than to these ratings.

5.3 Predictive modeling for attrition

Task 5.3. Predictive modeling for attrition
  • Create a logistic regression model to predict employee attrition using the following variables: salary, years_at_company, job_satisfaction, manager_rating, and work_life_balance. Save the model as hr_attrition_glm_model. Print the summary of the model using the summary function.

  • Install the sjPlot package and use the tab_model function to display the summary of the model. You may read the documentation here on how to customize your model summary.

  • Also, use the plot_model function to visualize the model coefficients. You may read the documentation here on how to customize your model visualization.

  • Discuss the results of the logistic regression model and what they suggest about the factors that contribute to employee attrition.

## run a logistic regression model to predict employee attrition
## save the model as hr_attrition_glm_model

hr_attrition_glm_model <- glm(
  bi_attrition ~ salary + years_at_company + job_satisfaction + manager_rating + work_life_balance, 
  data = hr_corr_data,       
  family = binomial       
)

## print the summary of the model using the summary function

summary(hr_attrition_glm_model)

Call:
glm(formula = bi_attrition ~ salary + years_at_company + job_satisfaction + 
    manager_rating + work_life_balance, family = binomial, data = hr_corr_data)

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)        2.571e+00  2.173e-01  11.831   <2e-16 ***
salary            -3.633e-06  4.086e-07  -8.893   <2e-16 ***
years_at_company  -6.333e-01  1.476e-02 -42.919   <2e-16 ***
job_satisfaction   3.470e-02  3.186e-02   1.089    0.276    
manager_rating     5.071e-03  3.810e-02   0.133    0.894    
work_life_balance  2.587e-02  3.198e-02   0.809    0.419    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 8574.5  on 6708  degrees of freedom
Residual deviance: 4781.6  on 6703  degrees of freedom
AIC: 4793.6

Number of Fisher Scoring iterations: 5
## install sjPlot package and use tab_model function to display the summary of the model

tab_model(hr_attrition_glm_model)
                     bi attrition
Predictors           Odds Ratios   CI              p
(Intercept)          13.08         8.56 – 20.07    <0.001
salary               1.00          1.00 – 1.00     <0.001
years at company     0.53          0.52 – 0.55     <0.001
job satisfaction     1.04          0.97 – 1.10     0.276
manager rating       1.01          0.93 – 1.08     0.894
work life balance    1.03          0.96 – 1.09     0.419
Observations         6709
R2 Tjur              0.502
## use plot_model function to visualize the model coefficients

plot_model(hr_attrition_glm_model, 
           show.values = TRUE,          
           value.offset = .3,           
           vline.color = "black",       
           value.size = 4,              
           title = "Odds Ratios for Employee Attrition Model",  
           axis.labels = c("Work-Life Balance", "Manager Rating", 
                           "Job Satisfaction", "Years at Company", 
                           "Salary"),    
           colors = c("lightgreen", "forestgreen")) +
  theme_minimal() + 
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),  
    axis.text.x = element_text(angle = 0, vjust = 1, hjust = 1),      
    axis.text = element_text(size = 10), 
    axis.title.x = element_text(size = 12, face = "bold"),  
    axis.title.y = element_text(size = 12, face = "bold")   
  )

Discussion:

The table and plot summarize a logistic regression analysis on employee attrition, focusing on predictors like salary, years at the company, job satisfaction, manager rating, and work-life balance.

Key Insights:

  1. Years at Company: Has a significant negative effect on attrition (OR = 0.53), meaning employees with longer tenures are much less likely to leave.

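  2. Salary: Is statistically significant (p < 0.001). The odds ratio displays as 1.00 only because it is expressed per single unit of salary. With a coefficient of -3.633e-06, a 100,000 increase in salary corresponds to an odds ratio of about exp(-0.36) ≈ 0.70, so higher pay is associated with noticeably lower odds of attrition.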

  3. Job Satisfaction, Manager Rating, Work-life Balance: These factors are not statistically significant. Their p-values are above 0.05 and their odds-ratio confidence intervals include 1, so the model shows no clear association between them and attrition once salary and tenure are accounted for.

Discussion:

The significant predictors are salary and years at the company. The salary odds ratio rounds to 1.00 because it measures the change in odds per single unit of salary. Scaled to a 100,000 difference, the coefficient of -3.633e-06 implies an odds ratio of about 0.70, or roughly 30% lower odds of leaving, so salary does matter in practical terms. Years at the company is the stronger predictor: each additional year is associated with about 47% lower odds of leaving (OR = 0.53), indicating that the longer employees stay, the more committed they may become to the organization.

The non-significant predictors (job satisfaction, manager rating, work-life balance) show weak or negligible relationships with attrition. Despite these factors being considered important for employee satisfaction and well-being, they do not show up as strong indicators of attrition in this specific model, which could imply that other factors (not captured here) are more crucial in employees’ decisions to leave. Alternatively, these factors may impact attrition indirectly or need further refinement in how they are measured.
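
Odds ratios depend on the unit of the predictor, so a coefficient for a variable measured in single currency units can look negligible even when it is not. The sketch below rescales the salary coefficient printed in the model summary to a 100,000 step. The choice of 100,000 as the step is an assumption made for readability, not something fixed by the model.

```r
# Coefficients copied from the summary() output of hr_attrition_glm_model
b_salary <- -3.633e-06  # change in log-odds per one unit of salary
b_tenure <- -6.333e-01  # change in log-odds per additional year at the company

or_per_unit <- exp(b_salary)           # odds ratio per single unit of salary
or_per_100k <- exp(b_salary * 100000)  # odds ratio per 100,000 of salary
or_per_year <- exp(b_tenure)           # odds ratio per additional year

round(c(per_unit = or_per_unit, per_100k = or_per_100k, per_year = or_per_year), 2)
# per_unit per_100k per_year
#     1.00     0.70     0.53
```

An equivalent approach is to refit the model with salary divided by 100,000, which would make tab_model report the rescaled odds ratio directly.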

5.4 Analysis of compensation and turnover

Task 5.4. Analyzing compensation and turnover
  • Compare the average monthly income of employees who left the company (bi_attrition = 1) and those who stayed (bi_attrition = 0). Use the t.test function to conduct a t-test and determine if there is a significant difference in average monthly income between the two groups. Save the results in a variable called attrition_ttest_results.

  • Install the report package and use the report function to generate a report of the t-test results.

  • Install the ggstatsplot package and use the ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed. Make sure to map the bi_attrition variable to the x argument and the salary variable to the y argument.

  • Visualize the salary variable for employees who left and those who stayed using geom_histogram with geom_freqpoly. Make sure to facet the plot by the bi_attrition variable and apply alpha on the histogram plot.

  • Provide recommendations on whether revising compensation policies could be an effective retention strategy.

## compare the average monthly income of employees who left and those who stayed

attrition_ttest_results <- t.test(salary ~ bi_attrition, data = hr_perf_dta)

## print the results of the t-test

print(attrition_ttest_results)

    Welch Two Sample t-test

data:  salary by bi_attrition
t = 18.869, df = 5524.2, p-value < 2.2e-16
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 38577.82 47523.18
sample estimates:
mean in group 0 mean in group 1 
      125007.26        81956.76 
## install the report package and use the report function to generate a report of the t-test results

report(attrition_ttest_results)
Effect sizes were labelled following Cohen's (1988) recommendations.

The Welch Two Sample t-test testing the difference of salary by bi_attrition
(mean in group 0 = 1.25e+05, mean in group 1 = 81956.76) suggests that the
effect is positive, statistically significant, and medium (difference =
43050.50, 95% CI [38577.82, 47523.18], t(5524.24) = 18.87, p < .001; Cohen's d
= 0.51, 95% CI [0.45, 0.56])
# install ggstatsplot package and use ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed

ggbetweenstats(
  data = hr_perf_dta, 
  x = bi_attrition, 
  y = salary, 
  title = "Salary Comparison Between Employees Who Stayed and Left",
  xlab = "Attrition Status (0 = Stayed, 1 = Left)",
  ylab = "Monthly Salary",
  messages = FALSE,
  plot.type = "violin",  
  mean.ci = TRUE,
  ggtheme = ggplot2::theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),  
      axis.text = element_text(size = 10)
    )
)

# create histogram and frequency polygon of salary for employees who left and those who stayed

ggplot(hr_perf_dta, aes(x = salary, fill = factor(bi_attrition))) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 30, color = "white") +
  geom_freqpoly(color = "black", linewidth = 1.2, bins = 30) +  # counts, so the polygon shares the histogram's y-axis
  facet_wrap(~ bi_attrition) +
  scale_fill_manual(values = c("forestgreen", "seagreen"), 
                    labels = c("Stayed", "Left")) +  
  labs(title = "Salary Distribution by Attrition Status",
       x = "Monthly Income", 
       y = "Count",
       fill = "Attrition Status") +  
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),  
    axis.text = element_text(size = 10),
    legend.position = "top",  
    legend.title = element_text(face = "bold"),  
    panel.grid.minor = element_blank()  
  )

Discussion:

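The t-test and the visualizations point to a clear relationship between salary and attrition. Employees who stayed earn substantially more on average (mean salary of about $125,007) than those who left (about $81,957), a difference of roughly $43,050 (95% CI [38,577.82, 47,523.18], t(5524.24) = 18.87, p < .001). By Cohen's (1988) conventions the effect is medium (d = 0.51). The violin plot and the salary histograms show the same disparity: employees who left tend to have lower salaries. Because lower pay is associated with leaving, revising compensation for lower-paid employees is a plausible retention strategy, although the test shows an association rather than a causal effect.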
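
The effect size quoted in the report output (Cohen's d) can be sketched by hand. The block below is a minimal illustration, not the original analysis: `cohens_d()` is a hypothetical helper that uses the pooled-standard-deviation definition, which may differ from the convention `report()` applies to a Welch test, and the salaries are simulated.

```r
# Hypothetical helper (not from the original report): Cohen's d with a pooled SD.
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  # Pooled standard deviation weights each group's variance by its degrees of freedom
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Simulated salaries for illustration only; these are NOT the hr_perf_dta values.
set.seed(147)
stayed <- rnorm(500, mean = 125000, sd = 90000)
left   <- rnorm(200, mean = 82000,  sd = 70000)

# Standardized mean difference between the two simulated groups
cohens_d(stayed, left)
```

On the real data, the same helper could be called as `cohens_d(salary[bi_attrition == 0], salary[bi_attrition == 1])` inside `with(hr_perf_dta, ...)`, assuming neither column contains missing values.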

5.5 Employee satisfaction and performance analysis

Task 5.5. Analyzing employee satisfaction and performance
  • Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed. Use the group_by and summarise functions to calculate the average performance ratings for each group.

  • Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot. Use the ggplot function to create the plot and map the SelfRating variable to the x argument and the bi_attrition variable to the fill argument.

  • Similarly, visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot. Make sure to map the ManagerRating variable to the x argument and the bi_attrition variable to the fill argument.

  • Create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition. Use the geom_boxplot function to create the plot and map the salary variable to the x argument, the job_satisfaction variable to the y argument, and the bi_attrition variable to the fill argument. You need to transform the job_satisfaction and bi_attrition variables into factors before creating the plot or within the ggplot function.

  • Discuss the results of the analysis and provide recommendations for HR interventions based on the findings.

# Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed.

avg_ratings <- hr_perf_dta |> 
  group_by(bi_attrition) |> 
  summarise(mean_self = mean(self_rating, na.rm = TRUE),
            mean_manager = mean(manager_rating, na.rm = TRUE))

# display the group means (0 = stayed, 1 = left)
avg_ratings
# Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot.

ggplot(na.omit(hr_perf_dta), aes(x = factor(bi_attrition), fill = factor(self_rating))) +
  geom_bar(position = "dodge", alpha = 0.8) +  
  geom_text(stat = "count", aes(label = after_stat(count)), 
            position = position_dodge(0.9), vjust = 2, size = 4, color = "white") +  
  scale_fill_manual(values = c("seagreen", "mediumseagreen", "forestgreen"), name = "Self Rating") + 
  scale_x_discrete(labels = c("Stayed", "Left")) +  
  labs(title = "Distribution of Self Ratings by Attrition Status", 
       x = "Attrition Status", 
       y = "Count") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),  
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    legend.position = "top",  
    panel.grid.minor = element_blank()  
  )

# Visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot.

ggplot(na.omit(hr_perf_dta), aes(x = factor(bi_attrition), fill = factor(manager_rating))) +
  geom_bar(position = "dodge", alpha = 0.8) +  
  geom_text(stat = "count", aes(label = after_stat(count)), 
            position = position_dodge(0.9), vjust = 2, size = 4, color = "white") +  
  scale_fill_manual(values = c("lightgreen", "mediumseagreen", "seagreen", "forestgreen"), name = "Manager Rating") + 
  scale_x_discrete(labels = c("Stayed", "Left")) +  
  labs(title = "Distribution of Manager Ratings by Attrition Status", 
       x = "Attrition Status", 
       y = "Count") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),  
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    legend.position = "top",  
    panel.grid.minor = element_blank()  
  )

# create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition.

ggplot(na.omit(hr_perf_dta), aes(x = factor(bi_attrition), y = salary, fill = factor(job_satisfaction))) +
  geom_boxplot(alpha = 0.8) +  
  scale_fill_manual(values = c("palegreen", "lightgreen", "mediumseagreen", "seagreen", "forestgreen"), name = "Job Satisfaction") +  
  labs(title = "Salary by Job Satisfaction and Attrition", 
       x = "Attrition Status", 
       y = "Salary") +
  scale_x_discrete(labels = c("Stayed", "Left")) +  
  theme_minimal() + 
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),  
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    legend.position = "top",  
    panel.grid.minor = element_blank()  
  )

Discussion:

In the box plot titled “Salary by Job Satisfaction and Attrition”, we can observe several key trends and comparisons between employees who stayed (labeled “Stayed”) and those who left (labeled “Left”). The salary distribution is divided by job satisfaction levels (ranging from 1 to 5, where 5 indicates the highest satisfaction).

Observations:

1. Salary Distribution Among Employees Who Stayed:

  1. The salary box plots for those who stayed show a wider range, especially for higher satisfaction levels (4 and 5). The median salary generally increases with higher job satisfaction.

  2. Employees with a job satisfaction level of 4 or 5 seem to have higher median salaries compared to those with lower satisfaction, suggesting that employees who are more satisfied might be compensated better.

2. Salary Distribution Among Employees Who Left:

  1. In contrast, the salary distributions for employees who left show less variation, with narrower box plots. Across all job satisfaction levels (1 to 5), the median salaries appear relatively lower compared to the employees who stayed.

  2. There is less of a relationship between salary and job satisfaction for those who left, indicating that salary alone might not have been a key factor in their decision to leave.

3. Comparison Between Stayed and Left:

  1. Employees who stayed generally have higher salaries, especially at higher levels of job satisfaction. This could indicate that companies are rewarding more satisfied employees with better pay.

  2. For employees who left, salary seems less influenced by job satisfaction. This may suggest that other factors beyond salary and job satisfaction (e.g., work-life balance, company culture) could be driving attrition.
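
The visual reading of the box plot can be cross-checked numerically by tabulating the median and spread of salary for each combination of attrition status and job satisfaction. The sketch below uses base R and a simulated data frame (`sim_dta`, a hypothetical stand-in that only borrows the column names of `hr_perf_dta`), so its numbers say nothing about the real data.

```r
# Simulated stand-in for hr_perf_dta; the values are invented for illustration.
set.seed(147)
n <- 600
sim_dta <- data.frame(
  bi_attrition     = rbinom(n, size = 1, prob = 0.3),
  job_satisfaction = sample(1:5, n, replace = TRUE),
  salary           = round(rlnorm(n, meanlog = 11.3, sdlog = 0.6))
)

# Median, interquartile range, and group size for every attrition x satisfaction cell
salary_summary <- aggregate(
  salary ~ bi_attrition + job_satisfaction,
  data = sim_dta,
  FUN = function(x) c(median = median(x), iqr = IQR(x), n = length(x))
)

# aggregate() returns the three statistics as one matrix column; flatten it
salary_summary <- do.call(data.frame, salary_summary)
salary_summary
```

Comparing `salary.median` across satisfaction levels within each attrition group shows whether the median really rises with satisfaction, and `salary.iqr` shows whether the boxes for leavers are in fact narrower.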

5.6 Work-life balance and retention strategies

Task 5.6. Analyzing work-life balance and retention strategies

At this point, you are already well aware of the dataset and the possible factors that contribute to employee attrition. Using your R skills, accomplish the following tasks:

  • Analyze the distribution of WorkLifeBalance ratings for employees who left versus those who stayed.

  • Use visualizations to show the differences.

  • Assess whether employees with poor work-life balance are more likely to leave.

You are free to choose how to accomplish this task. Be creative and provide insights that will help HR develop effective retention strategies.

# create a bar plot of work_life_balance ratings by bi_attrition to compare the rating distribution of employees who stayed and those who left.

ggplot(na.omit(hr_perf_dta), aes(x = factor(work_life_balance), fill = factor(bi_attrition))) +
  geom_bar(position = "dodge") +
  labs(
    title = "Distribution of Work-Life Balance Ratings by Attrition Status",
    x = "Work-Life Balance Rating (1 = Unacceptable, 5 = Above and Beyond)",
    y = "Count",
    fill = "Attrition Status\n(0 = Stayed, 1 = Left)"
  ) +
  scale_fill_manual(values = c("lightgreen", "seagreen")) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    axis.text = element_text(size = 10)
  )

Discussion:

The bar plot compares the Work-Life Balance ratings between employees who stayed (Attrition = 0) and those who left (Attrition = 1). Here’s a breakdown of the findings:

Observations:

1. Lower Work-Life Balance Ratings (1 and 2):

  1. A small number of employees received the lowest Work-Life Balance rating (1), with very few leaving.

  2. For a rating of 2, we observe a significant number of employees who left. The dark green bars representing employees who left make up a considerable portion of the total for this rating. This suggests that employees with a lower work-life balance are more likely to leave.

2. Moderate Work-Life Balance Ratings (3 and 4):

  1. For ratings of 3 and 4, there is still a noticeable portion of employees leaving, although the proportion is smaller than for rating 2.

  2. The majority of employees with these ratings stayed, but there’s still a relatively high attrition rate.

3. Higher Work-Life Balance Rating (5):

  1. For rating 5 (above and beyond), the number of employees who stayed is significantly higher, with fewer employees leaving.

  2. This suggests that employees who rate their work-life balance highly are more likely to stay, reinforcing the importance of work-life balance in employee retention.
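
Because the bar plot shows raw counts, a rating with many employees can look like a high-attrition group even when its attrition rate is ordinary. One way to check the observations above is to convert the counts to within-rating proportions and test for independence. The following is a minimal sketch on simulated data; `sim_dta` is a hypothetical stand-in for `hr_perf_dta`, and the chi-squared test is an added technique, not part of the original analysis.

```r
# Simulated stand-in for hr_perf_dta; the values are invented for illustration.
set.seed(147)
n <- 1000
sim_dta <- data.frame(
  work_life_balance = sample(1:5, n, replace = TRUE),
  bi_attrition      = rbinom(n, size = 1, prob = 0.3)
)

# Cross-tabulate work-life balance ratings against attrition status
wlb_tab <- table(sim_dta$work_life_balance, sim_dta$bi_attrition,
                 dnn = c("work_life_balance", "bi_attrition"))

# Row-wise proportions: within each rating, the share who stayed (0) and left (1)
wlb_rates <- prop.table(wlb_tab, margin = 1)
round(wlb_rates, 3)

# Chi-squared test of independence between rating and attrition status
chisq.test(wlb_tab)
```

If the column for `bi_attrition = 1` in `wlb_rates` falls as the rating rises, that would support the claim that poor work-life balance goes with higher attrition; similar rates across ratings would suggest the pattern in the bar plot mainly reflects group sizes.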

5.7 Recommendations for HR interventions

Task 5.7. Recommendations for HR interventions

Based on the analysis conducted, provide recommendations for HR interventions that could help reduce employee attrition and improve overall employee satisfaction and performance. You may use the following questions as a guide for your recommendations and discussion.

  • What are the key factors contributing to employee attrition in the company?

  • Which factors are most strongly correlated with attrition?

  • What strategies could be implemented to improve employee retention and satisfaction?

  • How can HR leverage the insights from the analysis to develop effective retention strategies?

  • What are the potential benefits of implementing these strategies for the company?

Recommendations:

Based on the analysis of employee data, here are actionable HR interventions to address the key factors contributing to employee attrition, improve retention, and enhance overall satisfaction and performance:

1. Key Factors Contributing to Employee Attrition

From the analysis, several key factors emerged as significant contributors to employee attrition:

  1. Work-Life Balance: Employees with poor work-life balance are much more likely to leave, as indicated by the higher attrition rates for those with lower Work-Life Balance ratings.

  2. Salary: Compensation is a crucial factor. Lower salary levels are associated with a higher probability of leaving.

  3. Job Satisfaction: Employees who reported low job satisfaction tended to leave at higher rates.

  4. Years at Company: Employees with fewer years of tenure are more likely to leave, indicating a retention issue early in employment.

2. Factors Most Strongly Correlated with Attrition

  1. Salary: Lower salaries were significantly correlated with higher attrition rates.

  2. Years at Company: Shorter tenure was strongly associated with higher attrition rates, suggesting issues with early-stage retention.

  3. Work-Life Balance: Poor work-life balance ratings were strongly linked to employee departures.

  4. Job Satisfaction: Lower job satisfaction is correlated with attrition, but the effect is somewhat less pronounced than salary or work-life balance.
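
A ranking like the one above can be made explicit by correlating each candidate factor with the 0/1 attrition indicator; with a binary variable, the Pearson correlation is the point-biserial correlation. The sketch below is illustrative only: the data are simulated, and the column name `years_at_company` is an assumption, since that variable does not appear in the code shown in this section.

```r
# Simulated stand-in for hr_perf_dta; the values and the years_at_company
# column name are assumptions made for this illustration.
set.seed(147)
n <- 1000
sim_dta <- data.frame(
  salary            = round(runif(n, 20000, 300000)),
  years_at_company  = sample(0:10, n, replace = TRUE),
  work_life_balance = sample(1:5, n, replace = TRUE),
  job_satisfaction  = sample(1:5, n, replace = TRUE)
)
# In this simulation only salary drives attrition (lower pay, higher chance of leaving)
sim_dta$bi_attrition <- rbinom(n, size = 1,
                               prob = plogis(0.5 - sim_dta$salary / 100000))

predictors <- c("salary", "years_at_company",
                "work_life_balance", "job_satisfaction")

# Point-biserial correlation of each factor with the attrition indicator
attrition_cor <- sapply(sim_dta[predictors],
                        function(x) cor(x, sim_dta$bi_attrition))

# Rank factors by the strength (absolute value) of the correlation
attrition_cor[order(abs(attrition_cor), decreasing = TRUE)]
```

A negative value means that higher values of the factor go with staying. Correlations of this kind describe association only and treat the 1 to 5 ratings as numeric, which is a simplification.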

3. Strategies to Improve Employee Retention and Satisfaction

A. Improve Work-Life Balance:

  1. Flexible Work Arrangements: Offer flexible work hours, hybrid or remote work options, and encourage time off to reduce burnout.

  2. Workload Management: Regularly assess workloads to ensure that employees aren’t overwhelmed, especially in roles with high attrition rates.

B. Review and Adjust Compensation Packages:

  1. Competitive Pay: Ensure salaries are competitive within the industry, especially for roles with higher attrition rates. This could involve offering performance bonuses or financial incentives tied to tenure.

  2. Salary Increases Based on Tenure: Implement incremental salary increases based on years at the company to retain newer employees.

C. Increase Job Satisfaction and Career Development:

  1. Career Advancement Opportunities: Develop clear pathways for career growth, including leadership development programs and internal promotions, which can increase job satisfaction.

  2. Skill Development: Offer regular training and professional development opportunities to help employees grow and feel valued.

  3. Regular Feedback and Recognition: Foster a culture of recognition, where employees feel appreciated for their work. Frequent feedback loops can also help employees improve and feel engaged.

D. Strengthen Onboarding and Early Retention Programs:

  1. Enhanced Onboarding: Improve the onboarding experience by setting clear expectations, providing early career support, and fostering a sense of community.

  2. Mentorship Programs: Pair new employees with mentors to help them navigate the company culture and develop stronger ties, reducing the risk of early attrition.

4. Leveraging Insights for Effective Retention Strategies

  1. Data-Driven Retention Programs: Use predictive modeling to identify employees at risk of leaving based on key factors like job satisfaction, salary, and work-life balance, allowing HR to intervene proactively.

  2. Customizing Retention Programs: Tailor interventions based on employee segments (e.g., recent hires, mid-career employees) since attrition factors vary across groups.

  3. Exit Interviews and Surveys: Conduct regular employee surveys and exit interviews to continuously gather insights on reasons for leaving and areas for improvement.
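
The predictive-modeling idea in point 1 could start from a logistic regression of the attrition indicator on the key factors. This is a minimal sketch under stated assumptions, not a model fitted in the original analysis: the data are simulated, `attrition_fit` and `risk` are hypothetical names, and a real model would need validation on held-out data.

```r
# Simulated stand-in for hr_perf_dta; the values are invented for illustration.
set.seed(147)
n <- 1000
sim_dta <- data.frame(
  salary            = round(runif(n, 20000, 300000)),
  job_satisfaction  = sample(1:5, n, replace = TRUE),
  work_life_balance = sample(1:5, n, replace = TRUE)
)
# Assumption for the demo: lower salary raises the probability of leaving
sim_dta$bi_attrition <- rbinom(n, size = 1,
                               prob = plogis(0.5 - sim_dta$salary / 100000))

# Logistic regression: log-odds of leaving as a function of the key factors.
# Salary is rescaled to units of 10,000 so its coefficient is easier to read.
attrition_fit <- glm(
  bi_attrition ~ I(salary / 10000) + job_satisfaction + work_life_balance,
  data = sim_dta,
  family = binomial
)
summary(attrition_fit)

# Predicted probability of leaving for each employee, highest risk first
sim_dta$risk <- predict(attrition_fit, type = "response")
head(sim_dta[order(-sim_dta$risk), ], 10)
```

HR could review the employees with the highest predicted `risk` first, treating the scores as a prompt for a conversation rather than a verdict.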

5. Potential Benefits for the Company

  1. Reduced Turnover Costs: Lower attrition reduces recruitment, training, and onboarding costs associated with hiring new employees.

  2. Improved Employee Morale: When employees feel supported, with better work-life balance, competitive pay, and opportunities for growth, overall morale and job satisfaction will improve.

  3. Enhanced Productivity: Satisfied employees are more engaged and productive, leading to higher output and innovation.

  4. Strengthened Employer Brand: Companies that invest in employee well-being and career development will be seen as attractive employers, helping attract top talent and retain key performers.

Conclusion:

The analysis highlights several areas for improvement (work-life balance, salary, job satisfaction, and early retention programs) that, if addressed strategically, can reduce attrition and raise employee engagement. HR should take a proactive approach by implementing targeted interventions that address the needs of both current employees and new hires.